Results 1 - 18 of 18
1.
IEEE Trans Pattern Anal Mach Intell ; 44(10): 6594-6601, 2022 10.
Article in English | MEDLINE | ID: mdl-34170823

ABSTRACT

In this paper, we consider how to incorporate psychophysical measurements of human visual perception into the loss function of a deep neural network being trained for a recognition task, under the assumption that such information can reduce errors. As a case study to assess the viability of this approach, we look at the problem of handwritten document transcription. While good progress has been made towards automatically transcribing modern handwriting, significant challenges remain in transcribing historical documents. Here we describe a general enhancement strategy, underpinned by the new loss formulation, which can be applied to the training regime of any deep learning-based document transcription system. Through experimentation, reliable performance improvement is demonstrated for the standard IAM and RIMES datasets for three different network architectures. Further, we go on to show feasibility for our approach on a new dataset of digitized Latin manuscripts, originally produced by scribes in the Cloister of St. Gall in the 9th century.
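
As a rough illustration of what "incorporating psychophysical measurements into the loss" could look like, the sketch below scales a per-sample cross-entropy by a human-derived weight. This is not the formulation from the paper, which the abstract does not give; the `human_difficulty` score, the weighting form, and every function name here are assumptions made purely for illustration.

```python
import math


def cross_entropy(probs, label):
    """Standard cross-entropy for one sample; probs is a list of class probabilities."""
    return -math.log(max(probs[label], 1e-12))


def psychophysical_loss(probs, label, human_difficulty, strength=1.0):
    """Cross-entropy scaled by a human-derived weight (hypothetical form).

    human_difficulty is assumed to lie in [0, 1]: 0 means human readers
    handled the sample easily, 1 means they struggled. Errors on samples
    that humans find easy are penalized more heavily.
    """
    if not 0.0 <= human_difficulty <= 1.0:
        raise ValueError("human_difficulty must be in [0, 1]")
    weight = 1.0 + strength * (1.0 - human_difficulty)
    return weight * cross_entropy(probs, label)


# The same wrong-leaning prediction costs more on a sample humans read easily.
easy = psychophysical_loss([0.2, 0.8], label=0, human_difficulty=0.0)
hard = psychophysical_loss([0.2, 0.8], label=0, human_difficulty=1.0)
```

With `strength=1.0`, the easy sample's loss is exactly twice the plain cross-entropy, while the hard sample's loss is left unchanged.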


Subject(s)
Algorithms; Neural Networks, Computer; Handwriting; Humans; Perception
2.
IEEE Trans Image Process ; 30: 6892-6905, 2021.
Article in English | MEDLINE | ID: mdl-34288871

ABSTRACT

Images from social media can reflect diverse viewpoints, heated arguments, and expressions of creativity, adding new complexity to retrieval tasks. Researchers working on Content-Based Image Retrieval (CBIR) have traditionally tuned their algorithms to match filtered results with user search intent. However, we are now bombarded with composite images of unknown origin, authenticity, and even meaning. With such uncertainty, users may not have an initial idea of what the search query results should look like. For instance, hidden people, spliced objects, and subtly altered scenes can be difficult for a user to detect initially in a meme image, but may contribute significantly to its composition. It is pertinent to design systems that retrieve images with these nuanced relationships in addition to providing more traditional results, such as duplicates and near-duplicates - and to do so with enough efficiency at large scale. We propose a new approach for spatial verification that aims at modeling object-level regions using image keypoints retrieved from an image index, which is then used to accurately weight small contributing objects within the results, without the need for costly object detection steps. We call this method the Objects in Scene to Objects in Scene (OS2OS) score, and it is optimized for fast matrix operations, which can run quickly on either CPUs or GPUs. It performs comparably to state-of-the-art methods on classic CBIR problems (Oxford 5K, Paris 6K, and Google-Landmarks), and outperforms them in emerging retrieval tasks such as image composite matching in the NIST MFC2018 dataset and meme-style imagery from Reddit.

3.
Sci Rep ; 11(1): 1002, 2021 01 13.
Article in English | MEDLINE | ID: mdl-33441714

ABSTRACT

The analysis of fish behavior in response to odor stimulation is a crucial component of the general study of cross-modal sensory integration in vertebrates. In zebrafish, the centrifugal pathway runs between the olfactory bulb and the neural retina, originating at the terminalis neuron in the olfactory bulb. Any changes in the ambient odor of a fish's environment warrant a change in visual sensitivity and can trigger mating-like behavior in males due to increased GnRH signaling in the terminalis neuron. Behavioral experiments to study this phenomenon are commonly conducted in a controlled environment where a video of the fish is recorded over time before and after the application of chemicals to the water. Given the subtleties of behavioral change, trained biologists are currently required to annotate such videos as part of a study. This process of manually analyzing the videos is time-consuming, requires multiple experts to avoid human error/bias and cannot be easily crowdsourced on the Internet. Machine learning algorithms from computer vision, on the other hand, have proven to be effective for video annotation tasks because they are fast, accurate, and, if designed properly, can be less biased than humans. In this work, we propose to automate the entire process of analyzing videos of behavior changes in zebrafish by using tools from computer vision, relying on minimal expert supervision. The overall objective of this work is to create a generalized tool to predict animal behaviors from videos using state-of-the-art deep learning models, with the dual goal of advancing understanding in biology and engineering a more robust and powerful artificial information processing system for biologists.


Subject(s)
Behavior, Animal/physiology; Olfactory Bulb/physiology; Zebrafish/physiology; Algorithms; Animals; Computers; Female; Male; Neurons/physiology; Odorants; Retina/physiology
4.
IEEE Trans Pattern Anal Mach Intell ; 43(12): 4272-4290, 2021 12.
Article in English | MEDLINE | ID: mdl-32750769

ABSTRACT

What is the current state-of-the-art for image restoration and enhancement applied to degraded images acquired under less than ideal circumstances? Can the application of such algorithms as a pre-processing step improve image interpretability for manual analysis or automatic visual recognition to classify scene content? While there have been important advances in the area of computational photography to restore or enhance the visual quality of an image, the capabilities of such techniques have not always translated in a useful way to visual recognition tasks. Consequently, there is a pressing need for the development of algorithms that are designed for the joint problem of improving visual appearance and recognition, which will be an enabling factor for the deployment of visual recognition tools in many real-world scenarios. To address this, we introduce the UG2 dataset as a large-scale benchmark composed of video imagery captured under challenging conditions, and two enhancement tasks designed to test algorithmic impact on visual quality and automatic object recognition. Furthermore, we propose a set of metrics to evaluate the joint improvement of such tasks as well as individual algorithmic advances, including a novel psychophysics-based evaluation regime for human assessment and a realistic set of quantitative measures for object recognition performance. We introduce six new algorithms for image restoration or enhancement, which were created as part of the IARPA-sponsored UG2 Challenge workshop held at CVPR 2018. Under the proposed evaluation regime, we present an in-depth analysis of these algorithms and a host of deep learning-based and classic baseline approaches. From the observed results, it is evident that we are in the early days of building a bridge between computational photography and visual recognition, leaving many opportunities for innovation in this area.

5.
Article in English | MEDLINE | ID: mdl-32224457

ABSTRACT

Existing enhancement methods are empirically expected to help the high-level end computer vision task; however, this is observed not always to be the case in practice. We focus on object or face detection in poor-visibility environments caused by bad weather (haze, rain) and low-light conditions. To provide a more thorough examination and fair comparison, we introduce three benchmark sets collected in real-world hazy, rainy, and low-light conditions, respectively, with annotated objects/faces. We launched the UG2+ challenge Track 2 competition in IEEE CVPR 2019, aiming to evoke a comprehensive discussion and exploration about whether and how low-level vision techniques can benefit high-level automatic visual recognition in various scenarios. To the best of our knowledge, this is the first and currently largest effort of its kind. Baseline results by cascading existing enhancement and detection models are reported, indicating the highly challenging nature of our new data as well as the large room for further technical innovations. Thanks to broad participation from the research community, we are able to analyze representative team solutions, striving to better identify the strengths and limitations of existing mindsets as well as future directions.

6.
IEEE Trans Image Process ; 29(1): 2150-2165, 2020.
Article in English | MEDLINE | ID: mdl-31613762

ABSTRACT

In this paper we address the problem of hallucinating high-resolution facial images from low-resolution inputs at high magnification factors. We approach this task with convolutional neural networks (CNNs) and propose a novel (deep) face hallucination model that incorporates identity priors into the learning procedure. The model consists of two main parts: i) a cascaded super-resolution network that upscales the low-resolution facial images, and ii) an ensemble of face recognition models that act as identity priors for the super-resolution network during training. Different from most competing super-resolution techniques that rely on a single model for upscaling (even with large magnification factors), our network uses a cascade of multiple SR models that progressively upscale the low-resolution images using steps of 2×. This characteristic allows us to apply supervision signals (target appearances) at different resolutions and incorporate identity constraints at multiple scales. The proposed C-SRIP model (Cascaded Super Resolution with Identity Priors) is able to upscale (tiny) low-resolution images captured in unconstrained conditions and produce visually convincing results for diverse low-resolution inputs. We rigorously evaluate the proposed model on the Labeled Faces in the Wild (LFW), Helen and CelebA datasets and report superior performance compared to the existing state-of-the-art.
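
The progressive 2× cascade described above implies a fixed sequence of intermediate resolutions at which supervision can be applied. The small helper below only works out that sequence; it does not model the networks themselves, and the function name and the example input size are illustrative assumptions rather than values from the paper.

```python
def cascade_resolutions(input_size, magnification, step=2):
    """Return the image size after each stage of a cascade of `step`x upscalers.

    input_size is the side length of the low-resolution image in pixels and
    magnification is the overall upscaling factor, which must be a power of
    `step` for the cascade to land on it exactly.
    """
    if input_size <= 0 or magnification < 1 or step < 2:
        raise ValueError("sizes and factors must be positive, step at least 2")
    sizes = []
    size, factor = input_size, 1
    while factor < magnification:
        size *= step
        factor *= step
        sizes.append(size)
    if factor != magnification:
        raise ValueError("magnification must be a power of the step size")
    return sizes


# A hypothetical 24-pixel face upscaled 8x passes through three 2x stages.
stages = cascade_resolutions(24, 8)
```

Each entry in `stages` is a resolution at which a target appearance or an identity constraint could be attached during training.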


Subject(s)
Deep Learning; Face/anatomy & histology; Face/diagnostic imaging; Image Processing, Computer-Assisted/methods; Algorithms; Databases, Factual; Humans; Neural Networks, Computer
7.
Article in English | MEDLINE | ID: mdl-30863298

ABSTRACT

We propose a computational model of vision that describes the integration of cross-modal sensory information between the olfactory and visual systems in zebrafish based on the principles of the statistical extreme value theory. The integration of olfacto-retinal information is mediated by the centrifugal pathway that originates from the olfactory bulb and terminates in the neural retina. Motivation for using extreme value theory stems from physiological evidence suggesting that extremes and not the mean of the cell responses direct cellular activity in the vertebrate brain. We argue that the visual system, as measured by retinal ganglion cell responses in spikes/sec, follows an extreme value process for sensory integration and the increase in visual sensitivity from the olfactory input can be better modeled using extreme value distributions. As zebrafish maintains high evolutionary proximity to mammals, our model can be extended to other vertebrates as well.

8.
IEEE Trans Pattern Anal Mach Intell ; 41(9): 2280-2286, 2019 09.
Article in English | MEDLINE | ID: mdl-29994469

ABSTRACT

By providing substantial amounts of data and standardized evaluation protocols, datasets in computer vision have helped fuel advances across all areas of visual recognition. But even in light of breakthrough results on recent benchmarks, it is still fair to ask if our recognition algorithms are doing as well as we think they are. The vision sciences at large make use of a very different evaluation regime known as Visual Psychophysics to study visual perception. Psychophysics is the quantitative examination of the relationships between controlled stimuli and the behavioral responses they elicit in experimental test subjects. Instead of using summary statistics to gauge performance, psychophysics directs us to construct item-response curves made up of individual stimulus responses to find perceptual thresholds, thus allowing one to identify the exact point at which a subject can no longer reliably recognize the stimulus class. In this article, we introduce a comprehensive evaluation framework for visual recognition models that is underpinned by this methodology. Over millions of procedurally rendered 3D scenes and 2D images, we compare the performance of well-known convolutional neural networks. Our results bring into question recent claims of human-like performance, and provide a path forward for correcting newly surfaced algorithmic deficiencies.
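
To make the idea of reading a perceptual threshold off an item-response curve concrete, the sketch below takes accuracy measured at increasing perturbation levels and interpolates the level at which accuracy first falls below a chosen criterion. The function name, the 0.5 criterion, and the sample numbers are illustrative assumptions, not the paper's framework or data.

```python
def perceptual_threshold(levels, accuracies, criterion=0.5):
    """Find where an item-response curve first drops below `criterion`.

    levels must be sorted in increasing order of stimulus perturbation, with
    accuracies[i] the recognition accuracy measured at levels[i]. The crossing
    point is located by linear interpolation between neighbouring measurements.
    Returns None if accuracy never falls below the criterion.
    """
    if len(levels) != len(accuracies):
        raise ValueError("levels and accuracies must have the same length")
    points = list(zip(levels, accuracies))
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if y0 >= criterion > y1:
            return x0 + (y0 - criterion) * (x1 - x0) / (y0 - y1)
    return None


# Hypothetical curve: accuracy of a model as blur strength increases.
blur_levels = [0, 1, 2, 3, 4]
model_accuracy = [0.98, 0.95, 0.80, 0.40, 0.10]
threshold = perceptual_threshold(blur_levels, model_accuracy)
```

Comparing thresholds across models, or between a model and human subjects, gives a finer-grained picture than a single summary accuracy.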

9.
Sci Rep ; 8(1): 17585, 2018 Nov 29.
Article in English | MEDLINE | ID: mdl-30498261

ABSTRACT

A correction to this article has been published and is linked from the HTML and PDF versions of this paper. The error has been fixed in the paper.

10.
Sci Rep ; 8(1): 14247, 2018 09 24.
Article in English | MEDLINE | ID: mdl-30250218

ABSTRACT

Imaging is a dominant strategy for data collection in neuroscience, yielding stacks of images that often scale to gigabytes of data for a single experiment. Machine learning algorithms from computer vision can serve as a pair of virtual eyes that tirelessly processes these images, automatically detecting and identifying microstructures. Unlike learning methods, our Flexible Learning-free Reconstruction of Imaged Neural volumes (FLoRIN) pipeline exploits structure-specific contextual clues and requires no training. This approach generalizes across different modalities, including serially-sectioned scanning electron microscopy (sSEM) of genetically labeled and contrast enhanced processes, spectral confocal reflectance (SCoRe) microscopy, and high-energy synchrotron X-ray microtomography (µCT) of large tissue volumes. We deploy the FLoRIN pipeline on newly published and novel mouse datasets, demonstrating the high biological fidelity of the pipeline's reconstructions. FLoRIN reconstructions are of sufficient quality for preliminary biological study, for example examining the distribution and morphology of cells or extracting single axons from functional data. Compared to existing supervised learning methods, FLoRIN is one to two orders of magnitude faster and produces high-quality reconstructions that are tolerant to noise and artifacts, as is shown qualitatively and quantitatively.


Subject(s)
Image Processing, Computer-Assisted/methods; Imaging, Three-Dimensional/methods; Machine Learning; Algorithms; Animals; Mice; Synchrotrons/instrumentation; X-Ray Microtomography/methods
11.
Article in English | MEDLINE | ID: mdl-30130187

ABSTRACT

Prior art has shown it is possible to estimate, through image processing and computer vision techniques, the types and parameters of transformations that have been applied to the content of individual images to obtain new images. Given a large corpus of images and a query image, an interesting further step is to retrieve the set of original images whose content is present in the query image, as well as the detailed sequences of transformations that yield the query image given the original images. This is a problem that recently has received the name of image provenance analysis. In these times of public media manipulation (e.g., fake news and meme sharing), obtaining the history of image transformations is relevant for fact checking and authorship verification, among many other applications. This article presents an end-to-end processing pipeline for image provenance analysis, which works at real-world scale. It employs a cutting-edge image filtering solution that is custom-tailored for the problem at hand, as well as novel techniques for obtaining the provenance graph that expresses how the images, as nodes, are ancestrally connected. A comprehensive set of experiments for each stage of the pipeline is provided, comparing the proposed solution with state-of-the-art results, employing previously published datasets. In addition, this work introduces a new dataset of real-world provenance cases from the social media site Reddit, along with baseline results.

12.
Sci Rep ; 8(1): 5397, 2018 03 29.
Article in English | MEDLINE | ID: mdl-29599461

ABSTRACT

Machine learning is a field of computer science that builds algorithms that learn. In many cases, machine learning algorithms are used to recreate a human ability like adding a caption to a photo, driving a car, or playing a game. While the human brain has long served as a source of inspiration for machine learning, little effort has been made to directly use data collected from working brains as a guide for machine learning algorithms. Here we demonstrate a new paradigm of "neurally-weighted" machine learning, which takes fMRI measurements of human brain activity from subjects viewing images, and infuses these data into the training process of an object recognition learning algorithm to make it more consistent with the human brain. After training, these neurally-weighted classifiers are able to classify images without requiring any additional neural data. We show that our neural-weighting approach can lead to large performance gains when used with traditional machine vision features, as well as to significant improvements with already high-performing convolutional neural network features. The effectiveness of this approach points to a path forward for a new class of hybrid machine learning algorithms which take both inspiration and direct constraints from neuronal data.


Subject(s)
Brain/physiology; Machine Learning; Brain/diagnostic imaging; Humans; Image Processing, Computer-Assisted; Magnetic Resonance Imaging
13.
IEEE Trans Pattern Anal Mach Intell ; 40(3): 762-768, 2018 03.
Article in English | MEDLINE | ID: mdl-28541894

ABSTRACT

It is often desirable to be able to recognize when inputs to a recognition function learned in a supervised manner correspond to classes unseen at training time. With this ability, new class labels could be assigned to these inputs by a human operator, allowing them to be incorporated into the recognition function, ideally under an efficient incremental update mechanism. While good algorithms that assume inputs from a fixed set of classes exist, e.g., artificial neural networks and kernel machines, it is not immediately obvious how to extend them to perform incremental learning in the presence of unknown query classes. Existing algorithms take little to no distributional information into account when learning recognition functions and lack a strong theoretical foundation. We address this gap by formulating a novel, theoretically sound classifier: the Extreme Value Machine (EVM). The EVM has a well-grounded interpretation derived from statistical Extreme Value Theory (EVT), and is the first classifier to be able to perform nonlinear kernel-free variable bandwidth incremental learning. Compared to other classifiers in the same deep network derived feature space, the EVM is accurate and efficient on an established benchmark partition of the ImageNet dataset.

14.
Elife ; 5, 2016 07 07.
Article in English | MEDLINE | ID: mdl-27383271

ABSTRACT

Resolving patterns of synaptic connectivity in neural circuits currently requires serial section electron microscopy. However, complete circuit reconstruction is prohibitively slow and may not be necessary for many purposes such as comparing neuronal structure and connectivity among multiple animals. Here, we present an alternative strategy, targeted reconstruction of specific neuronal types. We used viral vectors to deliver peroxidase derivatives, which catalyze production of an electron-dense tracer, to genetically identify neurons, and developed a protocol that enhances the electron-density of the labeled cells while retaining the quality of the ultrastructure. The high contrast of the marked neurons enabled two innovations that speed data acquisition: targeted high-resolution reimaging of regions selected from rapidly-acquired lower resolution reconstruction, and an unsupervised segmentation algorithm. This pipeline reduces imaging and reconstruction times by two orders of magnitude, facilitating directed inquiry of circuit motifs.


Subject(s)
Image Processing, Computer-Assisted/methods; Microscopy, Electron/methods; Microtomy/methods; Nerve Net/anatomy & histology; Neural Pathways/anatomy & histology; Retina/cytology; Staining and Labeling/methods; Animals; Female; Male; Mice, Inbred C57BL
15.
IEEE Trans Pattern Anal Mach Intell ; 36(11): 2317-24, 2014 Nov.
Article in English | MEDLINE | ID: mdl-26353070

ABSTRACT

Real-world tasks in computer vision often touch upon open set recognition: multi-class recognition with incomplete knowledge of the world and many unknown inputs. Recent work on this problem has proposed a model incorporating an open space risk term to account for the space beyond the reasonable support of known classes. This paper extends the general idea of open space risk limiting classification to accommodate non-linear classifiers in a multiclass setting. We introduce a new open set recognition model called compact abating probability (CAP), where the probability of class membership decreases in value (abates) as points move from known data toward open space. We show that CAP models improve open set recognition for multiple algorithms. Leveraging the CAP formulation, we go on to describe the novel Weibull-calibrated SVM (W-SVM) algorithm, which combines the useful properties of statistical extreme value theory for score calibration with one-class and binary support vector machines. Our experiments show that the W-SVM is significantly better for open set object detection and OCR problems when compared to the state-of-the-art for the same tasks.
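
The core property of an abating probability, that class membership becomes less likely as a point moves away from known data, can be sketched with a simple distance-based decay and a rejection threshold. The exponential form, the nearest-neighbour distance, and all names below are stand-in assumptions for illustration; this is not the W-SVM algorithm, which combines extreme value calibration with one-class and binary SVMs.

```python
import math


def cap_probability(x, known_points, scale=1.0):
    """Membership probability that abates with distance from known data.

    Uses an exponential decay of the distance to the nearest known training
    point (an assumed stand-in for a compact abating probability model).
    """
    nearest = min(math.dist(x, p) for p in known_points)
    return math.exp(-nearest / scale)


def classify_open_set(x, classes, threshold=0.5, scale=1.0):
    """Return the best-scoring label, or None when x falls in open space.

    classes maps each label to a list of training points for that class.
    """
    scores = {label: cap_probability(x, pts, scale) for label, pts in classes.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] >= threshold else None


# Hypothetical two-class training set in 2D.
training = {"a": [(0.0, 0.0), (0.0, 1.0)], "b": [(5.0, 5.0)]}
near_known = classify_open_set((0.1, 0.1), training)
far_away = classify_open_set((20.0, 20.0), training)
```

The point near class `a` is labelled, while the distant point is rejected as unknown rather than being forced into the closest class.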

16.
IEEE Trans Pattern Anal Mach Intell ; 36(8): 1679-86, 2014 Aug.
Article in English | MEDLINE | ID: mdl-26353347

ABSTRACT

For many problems in computer vision, human learners are considerably better than machines. Humans possess highly accurate internal recognition and learning mechanisms that are not yet understood, and they frequently have access to more extensive training data through a lifetime of unbiased experience with the visual world. We propose to use visual psychophysics to directly leverage the abilities of human subjects to build better machine learning systems. First, we use an advanced online psychometric testing platform to make new kinds of annotation data available for learning. Second, we develop a technique for harnessing these new kinds of information ("perceptual annotations") for support vector machines. A key intuition for this approach is that while it may remain infeasible to dramatically increase the amount of data and high-quality labels available for the training of a given system, measuring the exemplar-by-exemplar difficulty and pattern of errors of human annotators can provide important information for regularizing the solution of the system at hand. A case study for the problem of face detection demonstrates that this approach yields state-of-the-art results on the challenging FDDB data set.


Subject(s)
Machine Learning; Pattern Recognition, Automated/methods; Pattern Recognition, Visual/physiology; Data Curation; Databases, Factual; Face/anatomy & histology; Female; Humans; Male; Psychophysics; Support Vector Machine
17.
IEEE Trans Pattern Anal Mach Intell ; 35(7): 1757-72, 2013 Jul.
Article in English | MEDLINE | ID: mdl-23682001

ABSTRACT

To date, almost all experimental evaluations of machine learning-based recognition algorithms in computer vision have taken the form of "closed set" recognition, whereby all testing classes are known at training time. A more realistic scenario for vision applications is "open set" recognition, where incomplete knowledge of the world is present at training time, and unknown classes can be submitted to an algorithm during testing. This paper explores the nature of open set recognition and formalizes its definition as a constrained minimization problem. The open set recognition problem is not well addressed by existing algorithms because it requires strong generalization. As a step toward a solution, we introduce a novel "1-vs-set machine," which sculpts a decision space from the marginal distances of a 1-class or binary SVM with a linear kernel. This methodology applies to several different applications in computer vision where open set recognition is a challenging problem, including object recognition and face verification. We consider both in this work, with large scale cross-dataset experiments performed over the Caltech 256 and ImageNet sets, as well as face matching experiments performed over the Labeled Faces in the Wild set. The experiments highlight the effectiveness of machines adapted for open set evaluation compared to existing 1-class and binary SVMs for the same tasks.
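
One way to picture why a linear decision rule needs adapting for open set use: a single hyperplane accepts an unbounded half-space, so inputs arbitrarily far from the training data are still labelled positive. The sketch below bounds the accepted region between two parallel planes placed around the positive training scores. This slab construction is a simplified assumption for illustration only; the abstract does not spell out how the 1-vs-set machine places or refines its decision boundaries, and the weights here are hypothetical rather than learned.

```python
def decision_score(x, w, b):
    """Signed score of x under a linear model with weights w and bias b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def fit_slab(positives, w, b, margin=0.0):
    """Place near and far planes around the scores of the positive samples."""
    scores = [decision_score(p, w, b) for p in positives]
    return min(scores) - margin, max(scores) + margin


def accepts(x, w, b, slab):
    """Accept x only if its score lies between the two planes of the slab."""
    near, far = slab
    return near <= decision_score(x, w, b) <= far


# Hypothetical linear model and positive training samples.
w, b = (1.0, 0.0), 0.0
positives = [(1.0, 0.0), (2.0, 5.0), (3.0, -1.0)]
slab = fit_slab(positives, w, b)
```

A point such as `(10.0, 0.0)` has a positive score and would pass a plain half-space test, but it lies beyond the far plane and is rejected here.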


Subject(s)
Image Processing, Computer-Assisted/methods; Pattern Recognition, Automated/methods; Support Vector Machine; Animals; Biometric Identification; Humans
18.
IEEE Trans Pattern Anal Mach Intell ; 33(8): 1689-95, 2011 Aug.
Article in English | MEDLINE | ID: mdl-21422483

ABSTRACT

In this paper, we define meta-recognition, a performance prediction method for recognition algorithms, and examine the theoretical basis for its postrecognition score analysis form through the use of the statistical extreme value theory (EVT). The ability to predict the performance of a recognition system based on its outputs for each match instance is desirable for a number of important reasons, including automatic threshold selection for determining matches and nonmatches, and automatic algorithm selection or weighting for multi-algorithm fusion. The emerging body of literature on postrecognition score analysis has been largely constrained to biometrics, where the analysis has been shown to successfully complement or replace image quality metrics as a predictor. We develop a new statistical predictor based upon the Weibull distribution, which produces accurate results on a per instance recognition basis across different recognition problems. Experimental results are provided for two different face recognition algorithms, a fingerprint recognition algorithm, a SIFT-based object recognition system, and a content-based image retrieval system.
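
A minimal sketch of post-recognition score analysis with a Weibull predictor might look like the following: fit a Weibull distribution to the highest-scoring candidates other than the best one, then ask whether the best score is an outlier with respect to that fit. The tail size, the `delta` cut-off, the choice to exclude the top score from the fit, and all function names are assumptions for illustration; the abstract does not specify these details. Scores are assumed to be positive similarity values.

```python
import math


def fit_weibull(samples, iterations=100):
    """Maximum-likelihood Weibull fit returning (shape, scale).

    Requires positive samples with at least two distinct values. Samples are
    normalized by their maximum so that powers cannot overflow; the shape is
    then found by bisection on the standard likelihood equation.
    """
    if min(samples) <= 0:
        raise ValueError("samples must be positive")
    top = max(samples)
    xs = [s / top for s in samples]
    logs = [math.log(x) for x in xs]
    mean_log = sum(logs) / len(logs)

    def g(k):
        powered = [x ** k for x in xs]
        weighted = sum(p * l for p, l in zip(powered, logs)) / sum(powered)
        return weighted - 1.0 / k - mean_log

    lo, hi = 0.01, 100.0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            hi = mid
        else:
            lo = mid
    shape = 0.5 * (lo + hi)
    scale = top * (sum(x ** shape for x in xs) / len(xs)) ** (1.0 / shape)
    return shape, scale


def weibull_cdf(x, shape, scale):
    """Cumulative distribution function of the Weibull distribution."""
    return 1.0 - math.exp(-((x / scale) ** shape)) if x > 0 else 0.0


def predict_success(scores, tail_size=20, delta=1e-3):
    """Predict a correct match if the top score is an outlier to the tail fit."""
    ranked = sorted(scores, reverse=True)
    best, tail = ranked[0], ranked[1:tail_size + 1]
    shape, scale = fit_weibull(tail)
    return weibull_cdf(best, shape, scale) > 1.0 - delta


# Hypothetical gallery scores: twenty similar non-matches, plus one clear match.
non_matches = [0.40 + 0.01 * i for i in range(20)]
with_match = predict_success(non_matches + [0.95])
without_match = predict_success(non_matches)
```

Because the decision is made per query from that query's own score distribution, no fixed global threshold on the raw score is needed.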
